The framework and specifications for the National 
Assessment of Educational Progress (NAEP) science assessment were 
developed in 1991-92 and field tested in 1993. The assessment was 
postponed, however, and will be administered in 1996. The framework 
calls for performance-based tasks that probe students’ abilities to 
use materials to make observations, perform investigations, evaluate 
experimental results, and apply problem-solving skills, as well as 
constructed response and multiple choice items that explore student 
abilities. The core of the science framework consists of a 
three-by-three matrix that describes earth, physical, and life 
science and conceptual understanding, scientific investigation, and 
practical reasoning. The hands-on tasks were designed and developed 
in prepackaged kits for administration to large numbers of students. 
The field tests demonstrated that, given the constraints and 
challenges of task development, those that were selected for the 
national assessment did meet the test specifications. Three tables 
illustrate test characteristics. (SLD) 
Introduction 


The National Assessment of Educational Progress (NAEP) is the government’s 
primary indicator of what students at three grade/age levels, grade 4/age 9, grade 8/age 13, 
and grade 12/age 17, know and can do in different scholastic areas. In addition to questions in 
specific academic areas, NAEP also gathers data on student, teacher, and school 
background variables that may be related to subject-area performance. 

The framework and specifications for the NAEP science assessment were developed in 
1991-1992 under the auspices of the National Assessment Governing Board through a 
consensus process managed by the Council of Chief State School Officers. The questions for 
the assessment were field tested in 1993 and, based on the results, the science assessment was 
assembled for administration in 1994. The assessment, however, was postponed and will be 
administered in 1996. 


The Science Framework and Specifications 


The science framework is based on a twofold view: that scientific knowledge should 
be organized in a structure that connects discrete pieces of information in a meaningful way, 
and that science proficiency depends on a student's ability to know and integrate facts into 
larger concepts and themes using the tools, procedures, and reasoning processes of science. 
As an outgrowth of this view, the framework calls for NAEP’s science assessment to include: 


performance-based tasks that probe students’ abilities to use materials to make 
observations, perform investigations, evaluate experimental results, and apply 
problem-solving skills (no less than 30 percent of the assessment time), and 


constructed-response and multiple-choice questions that explore students’ 
abilities to explain, integrate, apply, reason, plan, design, evaluate, and 
communicate (no more than 70 percent of the assessment time) 


The core of the science framework consists of a three-by-three matrix that describes 
three major fields of science: earth, physical, and life; and three elements of knowing and 
doing science: conceptual understanding, scientific investigation, and practical reasoning. In 
addition to these main dimensions, the framework includes two other categories that 
describe science - the nature of science (which includes technology) and the organizing 
themes of science (models, systems, and patterns of change). The framework can be 
summarized as shown in Table 1. 


Table 1 - SCIENCE FRAMEWORK 


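                                        FIELDS OF SCIENCE 
KNOWING AND DOING               Earth       Physical       Life 

Conceptual Understanding 
Scientific Investigation 
Practical Reasoning 

Nature of Science 

Themes 
Models, Systems, Patterns of Change 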

Table 2 shows the framework in terms of recommended assessment time. 


Table 2 - NAEP SCIENCE ASSESSMENT FRAMEWORK SUMMARY 
Performance-based               | Not less than 30% | Not less than 30% | Not less than 30% 
Multiple-choice                 | Not more than 50% | Not more than 50% | Not more than 50% 
Short constructed-response      | Not less than 33% | Not less than 33% | Not less than 33% 
Extended constructed-response   | Not less than 17% | Not less than 17% | Not less than 17% 
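
The limits in Table 2 can be read as a simple checklist applied to any proposed allocation of assessment time. The sketch below is only an illustration: it assumes the "not more than 50%" row refers to multiple-choice questions, the category keys and the `check_allocation` function are hypothetical names, and the example allocation is invented rather than taken from the field test.

```python
# Time limits (percent of assessment time) as listed in Table 2.
# Category keys are hypothetical names chosen for this sketch.
LIMITS = {
    "performance_based": ("min", 30),
    "multiple_choice": ("max", 50),
    "short_constructed_response": ("min", 33),
    "extended_constructed_response": ("min", 17),
}


def check_allocation(shares):
    """Return the categories whose share of assessment time breaks its limit."""
    violations = []
    for category, (kind, bound) in LIMITS.items():
        share = shares.get(category, 0)
        if kind == "min" and share < bound:
            violations.append(category)
        elif kind == "max" and share > bound:
            violations.append(category)
    return violations


# Invented allocation for one grade, for illustration only.
example = {
    "performance_based": 25,
    "multiple_choice": 55,
    "short_constructed_response": 33,
    "extended_constructed_response": 17,
}
print(check_allocation(example))
```

Each category is checked on its own, so the sketch makes no assumption about whether performance-based time overlaps with the time assigned to the three question formats.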


Development of Hands-on Tasks 


A number of challenges and constraints faced test developers, science specialists, 
scoring specialists, and measurement specialists during the development of hands-on tasks. 
Briefly, the challenges included 

the large number of students who could be assessed at one time 

the location of the assessment 

cost of materials 

standardization of equipment and chemicals 

shelf-life of chemicals 

reading burden (appropriate reading level) 

questions that were independent of one another 

questions that could be scored reliably 

scoring guides that were comprehensive, allowing for non-standard answers, and 

tasks that met the objectives of the science framework 


The constraints included 

safety regulations of each state 
no toxic or corrosive chemicals 
no live organisms (except dormant) and 
no equipment requiring an electrical outlet 


The assessment is often administered to fairly large groups of students at one time 
(30-100 students) in settings such as school cafeterias with only one or two assessment 
supervisors. Thus, the types of materials and equipment that can be used by students in their 
tasks are somewhat limited. All materials and equipment for each task had to be included in a 
pre-packaged kit of manageable size. The cost of the kit for each student had to be no more 
than $10, and preferably much less. In addition, because materials had to be shipped to 
assessment supervisors well in advance of the assessment, and to areas of different 
temperatures (the assessment is administered in January and February), no task could involve 
materials that had a limited shelf-life or were influenced by temperature changes. Also, no 
task could include live organisms (with the exception of dormant material), toxic or corrosive 
chemicals, flames or other heat sources, or equipment requiring an electrical outlet (although 
batteries could be used). 
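
The kit restrictions described above amount to a checklist that could be applied to each proposed task. The following is a minimal sketch of such a check, not a procedure described in the paper: the `TaskKit` class, its field names, and the `kit_violations` function are hypothetical, and only the $10 ceiling and the listed exclusions come from the text.

```python
from dataclasses import dataclass

MAX_KIT_COST_DOLLARS = 10.00  # per-student ceiling stated in the text


@dataclass
class TaskKit:
    # All field names are hypothetical, chosen for this illustration.
    cost_dollars: float
    has_live_organisms: bool = False  # dormant material is permitted
    has_toxic_or_corrosive_chemicals: bool = False
    has_heat_source: bool = False
    needs_electrical_outlet: bool = False  # batteries are permitted
    has_limited_shelf_life: bool = False
    is_temperature_sensitive: bool = False


def kit_violations(kit):
    """Return a description of every restriction the kit breaks."""
    checks = [
        (kit.cost_dollars > MAX_KIT_COST_DOLLARS, "cost exceeds $10 per student"),
        (kit.has_live_organisms, "contains live (non-dormant) organisms"),
        (kit.has_toxic_or_corrosive_chemicals, "contains toxic or corrosive chemicals"),
        (kit.has_heat_source, "uses a flame or other heat source"),
        (kit.needs_electrical_outlet, "requires an electrical outlet"),
        (kit.has_limited_shelf_life, "materials have a limited shelf-life"),
        (kit.is_temperature_sensitive, "materials are affected by temperature changes"),
    ]
    return [message for failed, message in checks if failed]


# Invented kits, for illustration only.
print(kit_violations(TaskKit(cost_dollars=4.50)))
print(kit_violations(TaskKit(cost_dollars=12.00, has_heat_source=True)))
```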

In addition to these considerations, the tasks themselves had to contain enough 
information for students to complete them without asking questions of the administrator, who 
was only allowed to ascertain that students had the correct materials and equipment. A 
general framework for all tasks was developed and task developers worked within this 
framework. Specifically, each task has introductory information explaining briefly what is to 
be accomplished in the task; a diagram of the materials and equipment in each packet; 
directions for accomplishing the task written in language that is developmentally appropriate; 
and questions, both multiple choice and constructed response. Whilst this shell is standard, 
there are differences in the way the directions and questions are scaffolded. Some tasks are 
carefully scaffolded, with students being prompted to write down responses as they proceed 
through the task; in others the students have to complete the task and then answer the 
questions; a third type presents a problem that students have to solve given certain materials 
and equipment. 

Each task that was eventually selected for field testing had undergone rigorous 
pre-testing. The tasks were first administered on a one-to-one basis and then piloted in various 
schools. Students were questioned at each stage about the appropriateness and clarity of 
language and also monitored closely whilst the tasks were performed. 


The 1993 Science Field Test 


The purpose of the 1993 field test was to administer a large set of items and tasks so 
that those with the best statistical properties could be selected for the 1994 assessment. To 
obtain the exercises for the operational science assessment, approximately twice as many 
exercises as were needed were field tested at grade four. At grades eight and twelve, 
approximately 40 percent more exercises than were needed were field tested. In total, the 
1993 science field test contained 673 items, of which 450 were constructed-response items. 

Table 3 shows the number of blocks of items field tested. Each block at grade 4 
consists of a set of questions that take 20 minutes to complete and each block at grades 8 and 
12 consists of a set of questions that take 30 minutes to complete. 


Table 3 - NUMBER OF BLOCKS IN THE SCIENCE FIELD TEST 
Challenges of Scoring 


Four hundred and fifty constructed-response questions were scored at National 
Computer Systems (NCS) in Iowa City. Five hundred students answered each question. 
Scoring guides for each question had been prepared during the development of the questions 
and tasks; however, these had to be revised once student responses were seen. The 
scoring guides were analytical in nature and usually consisted of 3 or 4 points. Prior to the 
scoring, training packets made up of anchor papers, practice papers, qualification papers, and 
calibration papers were assembled from student responses. Four tables of ten scorers took 
seven weeks to score the questions. Each table had a trainer who was responsible for training 
the scorers and a table leader who was responsible for back reading and making certain that 
the scorers were scoring according to the scoring guide. Scorers were trained on the 
constructed-response questions in each block of items as a unit and then scored each student's 
block of questions. The questions in the hands-on tasks were mostly constructed-response and 
presented their own special scoring challenges. For example, when conclusions based on 
results were asked for and the results were not the expected ones, judgements had to be made 
about the plausibility of the conclusions based on the discrepant results. The scorers always 
did the tasks prior to scoring them. This enabled them to see that 


the equipment and materials did not always behave as expected 
there were not always right answers 


Despite the challenges, interrater reliability was for the most part excellent - above 
80% and, for the majority of questions, above 90%. 
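
The paper does not say how interrater reliability was calculated. Figures of this kind are often reported as percent exact agreement, that is, the share of responses to which two independent readers gave the same score. A minimal sketch under that assumption follows; the function name and the scores are hypothetical.

```python
def percent_agreement(first_reader, second_reader):
    """Percent of responses given identical scores by two readers."""
    if len(first_reader) != len(second_reader) or not first_reader:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(first_reader, second_reader))
    return 100.0 * matches / len(first_reader)


# Invented scores on a 4-point guide, for illustration only.
first = [3, 2, 1, 4, 2]
second = [3, 2, 2, 4, 2]
print(percent_agreement(first, second))  # 4 of 5 scores match
```

Percent exact agreement does not correct for agreement that would occur by chance, so it tends to look more favorable for guides with few score points.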


Conclusion 


Given the constraints and challenges that were encountered during the development of 
the hands-on tasks, those that were selected for the national assessment do meet the 
requirements stipulated in the science framework. Specifically, they “probe students’ abilities 
to use materials to make observations, perform investigations, evaluate experimental results, 
and apply problem-solving skills.” 


Reference 
National Assessment Governing Board, Science Framework for the 1994 National Assessment 
of Educational Progress (no date, pre-publication draft; 202-357-6938). 